BIT at SemEval-2016 Task 1: Sentence Similarity Based on Alignments and Vector with the Weight of Information Content

Authors

  • Hao Wu
  • Heyan Huang
  • Wenpeng Lu
Abstract

This paper describes three unsupervised systems, submitted to SemEval-2016 Task 1, for determining the semantic similarity between two short texts or sentences. All three use only off-the-shelf software and data, which makes them easy to replicate. Two of the systems achieved similar Pearson correlation coefficients (0.64661 with simple vectors, 0.65319 with word alignments). We also report experiments applying our alignment-based system to evaluation data from the 2014 and 2015 STS shared tasks. The results suggest that, beyond the core similarity algorithm, other factors such as data preprocessing and the use of domain-specific knowledge are also important to similarity prediction performance.
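The scores quoted in the abstract are Pearson correlation coefficients, which in STS evaluation compare a system's predicted similarity scores with human judgments. As a minimal sketch of that metric only (not the authors' systems), the following computes Pearson's r in plain Python. The function name `pearson` and the toy `gold` and `system` scores are hypothetical and chosen purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: human similarity judgments and system predictions
# for five sentence pairs (illustrative values, not from the paper).
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
system = [0.4, 1.0, 2.8, 3.9, 4.7]

r = pearson(system, gold)
print(f"Pearson r = {r:.4f}")
```

Because Pearson's r is invariant to positive linear rescaling, a system's scores do not need to be on the same scale as the human judgments in order to be compared this way.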


Similar resources

ISCAS_NLP at SemEval-2016 Task 1: Sentence Similarity Based on Support Vector Regression using Multiple Features

This paper describes our system developed for the English Monolingual subtask (STS Core) of SemEval-2016 Task 1: “Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation”. We measure the similarity between two sentences using three different types of features, including word alignment-based similarity, sentence vector-based similarity and sentence constituent similar...


DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition

We describe a set of top-performing systems at the SemEval 2015 English Semantic Textual Similarity (STS) task. Given two English sentences, each system outputs the degree of their semantic similarity. Our unsupervised system, which is based on word alignments across the two input sentences, ranked 5th among 73 submitted system runs with a mean correlation of 79.19% with human annotations. We a...


NORMAS at SemEval-2016 Task 1: SEMSIM: A Multi-Feature Approach to Semantic Text Similarity

This paper presents the submission of our team (NORMAS) to the SemEval 2016 semantic textual similarity (STS) shared task. We submitted three system runs, each using a set of 36 features extracted from the training set. The runs explore the use of the following three machine learning algorithms: Support Vector Regression, Elastic Net and Random Forest. Each run was trained using sentence pairs ...


Aicyber at SemEval-2016 Task 4: i-vector based sentence representation

This paper introduces aicyber’s systems for SemEval 2016, Task 4A. The first system is built on a vector space model (VSM); the second system is built on a new framework for estimating sentence vectors, inspired by the i-vector used in the speaker verification domain. Both systems are evaluated on SemEval 2016 (Task 4A) as well as the IMDB dataset. Evaluation results show that the i-vector based sentence ve...


Amrita_CEN at SemEval-2016 Task 1: Semantic Relation from Word Embeddings in Higher Dimension

Semantic Textual Similarity measures the similarity between a pair of texts, even when similar content is expressed using different words. This work attempts to derive the context space of a sentence from that sentence alone. It proposes a combination of Word2Vec and Non-Negative Matrix Factorization to represent the sentence as a context embedding vector in context space. Distance and corr...



Publication date: 2016